Powerful knockoffs via minimizing reconstructability

Authors

Abstract

Model-X knockoffs (J. R. Stat. Soc. Ser. B. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs but, surprisingly, we prove this can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs that allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show that MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new Python package, knockpy.
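The joint-dependency problem described in the abstract can be illustrated numerically. The sketch below is not taken from the paper or from knockpy; it uses only NumPy and rests on two labeled assumptions: (i) Gaussian knockoffs with joint covariance G = [[Σ, Σ − S], [Σ − S, Σ]] for a diagonal matrix S = diag(s), and (ii) for equicorrelated features with correlation ρ ≥ 0.5, the MAC-minimizing choice is s_j = 2(1 − ρ). The second choice of s in the code is an arbitrary comparison point, not an MRC solution.

```python
import numpy as np

def joint_cov(Sigma, s):
    """Covariance of (X, knockoffs) under the assumed Gaussian construction."""
    S = np.diag(s)
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

def mac(Sigma, s):
    """Mean absolute correlation between each feature and its own knockoff."""
    return float(np.mean(np.abs(np.diag(Sigma) - s)))

p, rho = 5, 0.6
# Exchangeable (equicorrelated) features with unit variances.
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# Assumption (ii): MAC-minimizing s for rho >= 0.5.
s_mac = np.full(p, 2 * (1 - rho))
# Arbitrary illustrative alternative with a larger MAC (not an MRC solution).
s_alt = np.full(p, 1 - rho)

results = {}
for name, s in [("mac", s_mac), ("alt", s_alt)]:
    G = joint_cov(Sigma, s)
    results[name] = {
        "mac": mac(Sigma, s),
        "min_eig": float(np.linalg.eigvalsh(G).min()),
        "rank": int(np.linalg.matrix_rank(G, tol=1e-8)),
    }
    print(name, results[name])
```

Under these assumptions, the MAC-minimizing choice yields a rank-deficient joint covariance (the smallest eigenvalue is numerically zero), meaning some linear combinations of features and knockoffs are exactly determined by the others. The alternative choice accepts a larger MAC but keeps the joint covariance full rank.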

Similar resources

Familywise Error Rate Control via Knockoffs

We present a novel method for controlling the k-familywise error rate (k-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testin...

Robust inference with knockoffs

We consider the variable selection problem, which seeks to identify important variables influencing a response Y out of many candidate features X1, . . . , Xp. We wish to do so while offering finite-sample guarantees about the fraction of false positives—selected variables Xj that in fact have no effect on Y after the other features are known. When the number of features p is large (perhaps eve...

State-based Reconstructability Analysis

Reconstructability analysis (RA) is a method for detecting and analyzing the structure of multivariate categorical data. While Jones and his colleagues extended the original variable-based formulation of RA to encompass models defined in terms of system states, their focus was the analysis and approximation of real-valued functions. In this paper, we separate two ideas that Jones had merged tog...

Reconstructability analysis of epistasis.

The literature on epistasis describes various methods to detect epistatic interactions and to classify different types of epistasis. Reconstructability analysis (RA) has recently been used to detect epistasis in genomic data. This paper shows that RA offers a classification of types of epistasis at three levels of resolution (variable-based models without loops, variable-based models with loops...

Journal

Journal title: Annals of Statistics

Year: 2022

ISSN: 0090-5364, 2168-8966

DOI: https://doi.org/10.1214/21-aos2104